Abstract. In this paper we consider circuit synthesis for n-wire linear reversible
circuits using the C-NOT gate library. These circuits are an important class of
reversible circuits with applications to quantum computation. Previous algorithms,
based on Gaussian elimination and LU-decomposition, yield circuits with $O(n^2)$
gates in the worst case. However, an information-theoretic bound suggests that it
may be possible to reduce this to as few as $O(n^2/\log n)$ gates.
We present an algorithm that is optimal up to a multiplicative constant, as well
as $\Theta(\log n)$ times faster than previous methods. While our results are primarily
asymptotic, simulation results show that even for relatively small n our algorithm
is faster and yields more efficient circuits than the standard method. Generically,
our algorithm can be interpreted as a matrix decomposition algorithm, yielding
an asymptotically efficient decomposition of a binary matrix into a product of
elementary matrices.

1 Introduction 

A reversible or information lossless circuit is one that implements a bijective function,
or loosely, a circuit where the inputs can be recovered from the outputs and all
output values are achievable. A major motivation for studying reversible circuits is the
emerging field of quantum computation [6]. A quantum circuit implements a unitary
function, and is therefore reversible. Circuit synthesis for reversible computations is an
active area of research [2,4,7,9]. The goal in circuit synthesis is, given a gate library, to
synthesize an efficient circuit performing a desired computation. In the quantum
context, the individual gates correspond to physical operations on quantum states called
qubits, and therefore reducing the number of gates in the synthesis generally leads to a
more efficient implementation.

Linear reversible classical circuits form an important sub-class of quantum circuits,
which can be generated by a single type of gate called a C-NOT gate (see Figure 1).
This gate is an important primitive for quantum computation because it forms a
universal gate set when augmented with single qubit rotations [?]. Moreover, current
quantum circuit synthesis algorithms can generate circuits with strings of C-NOT gates, and
therefore more efficient synthesis for these classical linear reversible sub-circuits would
imply more efficient synthesis for the overall quantum computation.

In this paper we consider the problem of efficiently synthesizing an arbitrary linear
reversible circuit on n wires using C-NOT gates. This problem can be mapped to the
problem of row reducing an $n \times n$ binary matrix. Until now the best synthesis methods
have been based on standard row reduction methods such as Gaussian elimination and
LU-decomposition, which yield circuits with $O(n^2)$ gates [3]. However, the best lower
bound leaves open the possibility that synthesis with as few as $O(n^2/\log n)$ gates in
the worst case may exist [5].

Fig. 1. Examples of reversible and irreversible logic gates with truth tables a) AND gate
b) NOT gate c) C-NOT gate. Both the NOT and C-NOT gates are reversible while the
AND gate is not.

We present a new synthesis algorithm that meets the lower bound, and is therefore
asymptotically optimal up to a multiplicative constant. Furthermore, our algorithm is
also asymptotically faster than previous methods. Simulation results show that the
proposed algorithm outperforms previous methods even for relatively small n. Generically,
our algorithm can be interpreted as a matrix decomposition algorithm that yields an
asymptotically efficient elementary matrix decomposition of a binary matrix.
Generalizations to matrices over larger finite fields are straightforward.



2 Background 

We can represent the action of an n-input m-output logic gate as a function mapping the
values of the inputs to those of the outputs: $f : \mathbb{F}_2^n \to \mathbb{F}_2^m$, where $f$ maps each element
of $\mathbb{F}_2^n$ to an element in $\mathbb{F}_2^m$. Here $\mathbb{F}_2$ is the two-element field, and $\mathbb{F}_2^n$ is the set of all
n-dimensional vectors over this field. A gate is reversible if this function is bijective,
that is, $f$ is one-to-one and onto. Intuitively, this means that the inputs can be uniquely
determined from the outputs and all output values are achievable. For example, the AND
gate (Figure 1a) is not reversible since it maps three input values to the same output
value. The NOT gate (Figure 1b), on the other hand, is reversible since both possible
input values yield unique output values, and both possible output values are achievable.
The controlled-NOT or C-NOT gate, shown in Figure 1c, is another important reversible
gate. This gate passes the first input, called the control, through unchanged and inverts
the second, called the target, if the control is a one. As its truth table shows, this gate
is reversible since it maps each input vector to a unique output vector and all output
vectors are achievable.
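
The reversibility test described above can be checked exhaustively for small gates. The following is a minimal Python sketch, not part of the original method; the function names (`is_reversible`, `and_gate`, `not_gate`, `cnot_gate`) are illustrative assumptions, and gates are modelled as functions on tuples of bits.

```python
from itertools import product

def is_reversible(gate, n_in):
    """A gate is reversible iff its truth table is a bijection on n_in-bit vectors."""
    inputs = list(product((0, 1), repeat=n_in))   # already in sorted order
    outputs = [gate(x) for x in inputs]
    # Bijective means the outputs are exactly the inputs, in some order.
    return sorted(outputs) == inputs

def and_gate(x):
    return (x[0] & x[1],)          # two inputs, one output

def not_gate(x):
    return (1 - x[0],)

def cnot_gate(x):
    control, target = x
    return (control, target ^ control)   # target is inverted when control is 1

print(is_reversible(and_gate, 2), is_reversible(not_gate, 1), is_reversible(cnot_gate, 2))
```

The AND gate fails the test because three of its four inputs share the output 0, while NOT and C-NOT each permute their input vectors.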

A reversible circuit is a directed acyclic combinatorial logic circuit where all gates
are reversible and are interconnected without fanout [9]. An example of a reversible
circuit consisting of C-NOT gates is shown in Figure 2. Note that, as is the case for
reversible gates, the function computed by a reversible circuit is bijective.

Fig. 2. Reversible circuit example.



We say a circuit or gate, computing the function $f$, is linear if $f(x_1 \oplus x_2) = f(x_1) \oplus f(x_2)$
for all $x_1, x_2 \in \mathbb{F}_2^n$, where $\oplus$ is the bitwise XOR operation. The C-NOT gate is an
example of a linear gate:

\begin{align*}
f([0\ 0]) \oplus f(x) &= f(x) \\
f(x) \oplus f(x) &= f([0\ 0]) \\
f([0\ 1]) \oplus f([1\ 0]) &= f([1\ 1]) \\
f([0\ 1]) \oplus f([1\ 1]) &= f([1\ 0]) \\
f([1\ 0]) \oplus f([1\ 1]) &= f([0\ 1])
\end{align*}
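
The linearity condition can also be verified by brute force over all pairs of inputs. This is a small illustrative Python sketch; the helper names `xor` and `is_linear` are assumptions introduced here, and the NOT gate is included only as a contrasting example of a reversible gate that is not linear.

```python
from itertools import product

def xor(a, b):
    """Bitwise XOR of two equal-length bit tuples."""
    return tuple(p ^ q for p, q in zip(a, b))

def cnot(x):
    control, target = x
    return (control, target ^ control)

def not_gate(x):
    return (1 - x[0],)

def is_linear(f, n):
    """Check f(a xor b) == f(a) xor f(b) for every pair of n-bit vectors."""
    vectors = list(product((0, 1), repeat=n))
    return all(f(xor(a, b)) == xor(f(a), f(b)) for a in vectors for b in vectors)

print(is_linear(cnot, 2), is_linear(not_gate, 1))
```

The NOT gate fails because it maps the all-zeros vector to a non-zero output, which no linear map can do.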



The action of any linear reversible circuit on n wires can be represented by a linear
transformation over $\mathbb{F}_2$. In particular, we can represent the action of the circuit as
multiplication by a non-singular $n \times n$ matrix $A$ with elements in $\mathbb{F}_2$:

\[ A x = y, \]

where x and y are n-dimensional vectors representing the values of the input and
output bits respectively. Using this representation, the action of a C-NOT gate corresponds
to multiplication by an elementary matrix, which is the identity matrix with one
off-diagonal entry set to one. Multiplication by an elementary matrix performs a row
operation, the addition of one row of a matrix or vector to another. Applying a series of
C-NOT gates corresponds to performing a series of these row operations on the input
vector or equivalently to multiplying it by a series of elementary matrices. For example,
the linear transform computed by the circuit in Figure 2 is given by





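\[
\overbrace{\begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&1&1 \end{bmatrix}}^{G_6}
\overbrace{\begin{bmatrix} 1&1&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix}}^{G_5}
\overbrace{\begin{bmatrix} 1&0&0&0\\ 0&1&1&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix}}^{G_4}
\overbrace{\begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&1&1&0\\ 0&0&0&1 \end{bmatrix}}^{G_3}
\overbrace{\begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&1&1 \end{bmatrix}}^{G_2}
\overbrace{\begin{bmatrix} 1&0&0&0\\ 1&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix}}^{G_1}
=
\begin{bmatrix} 1&0&1&0\\ 0&0&1&0\\ 1&1&1&0\\ 1&1&0&1 \end{bmatrix}
\]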
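
The correspondence between gates and elementary matrices can be checked numerically. The sketch below multiplies six elementary matrices over $\mathbb{F}_2$; the gate list (0-indexed target and control wires, ordered from the first gate applied to the last) is an assumption read off the printed matrices, and the helper names are illustrative.

```python
def elementary(n, target, control):
    """Identity matrix with an extra one at (target, control).

    Left-multiplying by it adds row `control` to row `target`.
    """
    E = [[int(i == j) for j in range(n)] for i in range(n)]
    E[target][control] = 1
    return E

def matmul_gf2(X, Y):
    """Matrix product over the two-element field."""
    n = len(X)
    return [[sum(X[i][k] & Y[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]

# Assumed gates G1..G6 as (target, control) pairs, 0-indexed.
gates = [(1, 0), (3, 2), (2, 1), (1, 2), (0, 1), (3, 2)]

A = [[int(i == j) for j in range(4)] for i in range(4)]
for target, control in gates:
    A = matmul_gf2(elementary(4, target, control), A)   # later gates multiply on the left

for row in A:
    print(row)
```

Because each later gate multiplies on the left, the first gate applied appears as the rightmost factor in the product.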



We can use the matrix notation to count the number of different n-input linear
reversible transformations. In order for the transformation to be reversible, its matrix must
be non-singular; in other words, every nontrivial sum of the rows should be non-zero. There
are $2^n - 1$ possible choices for the first row, all vectors except for the all zeros vector.
There are $2^n - 2$ possible choices for the second row, since it cannot be equal to the
first row or the all zeros vector. In general, there are $2^n - 2^{i-1}$ possible choices for the
ith row, since it cannot be any of the $2^{i-1}$ linear combinations of the previous $i - 1$ rows
(otherwise the matrix would be singular). Therefore there are

\[ \prod_{i=0}^{n-1} (2^n - 2^i) \]

unique n-input linear reversible transformations.
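
For small n the counting argument can be confirmed by enumeration. The following Python sketch compares the product formula with a brute-force count that uses the same non-singularity criterion as the text (no nontrivial sum of rows is zero); rows are stored as integer bitmasks, and the function names are illustrative assumptions.

```python
from itertools import product

def count_formula(n):
    """Product over i = 0..n-1 of (2^n - 2^i)."""
    total = 1
    for i in range(n):
        total *= 2**n - 2**i
    return total

def is_nonsingular(rows):
    """True if no nontrivial XOR of the rows (bitmasks) is the zero vector."""
    n = len(rows)
    for subset in range(1, 2**n):
        acc = 0
        for i in range(n):
            if (subset >> i) & 1:
                acc ^= rows[i]
        if acc == 0:
            return False
    return True

def count_brute(n):
    """Enumerate all n x n binary matrices; only practical for very small n."""
    return sum(is_nonsingular(rows) for rows in product(range(2**n), repeat=n))

for n in (1, 2, 3):
    print(n, count_formula(n), count_brute(n))
```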

Since any non-singular matrix A can be reduced to the identity matrix using row
operations, we can write A as a product of elementary matrices. Therefore, any linear
reversible function can be synthesized from C-NOT gates. Moreover, the problem
of C-NOT circuit synthesis is equivalent to the problem of row reduction of a matrix A
representing the linear reversible function: any synthesis of the circuit can be written as
a product of elementary matrices equal to A and any such product yields a synthesis. The
length of the synthesized circuit is given by the number of elementary matrices in the
product. Standard Gaussian elimination and LU-decomposition based methods require
$O(n^2)$ gates in the worst case [3]. However, the best lower bound is only $\Omega(n^2/\log n)$
gates [5].

Lemma 1 (Lower Bound). There are n-bit linear reversible transformations that
cannot be synthesized using fewer than $\Omega(n^2/\log n)$ C-NOT gates.

Proof. Let d be the maximum number of C-NOT gates needed to synthesize any linear
reversible function on n wires. The number of different C-NOT gates which can act on
n wires is $n(n-1)$. Therefore the number of unique C-NOT circuits with no more than d
gates must be no more than $(n^2 - n + 1)^d$, where we have included a do-nothing NOP
gate in addition to the $n^2 - n$ C-NOT gates to account for circuits with fewer than d
gates. Since the number of circuits with no more than d C-NOT gates must be at least
the number of unique linear reversible functions on n wires, we have the inequality

\[ (n^2 - n + 1)^d \ge \prod_{i=0}^{n-1} (2^n - 2^i) \ge 2^{n(n-1)}. \tag{1} \]

Taking the log of both the left and right sides of the inequality gives

\[ d \ge \frac{n(n-1)\log 2}{\log(n^2 - n + 1)} = \frac{n^2 - n}{\log_2(n^2 - n + 1)} = \Omega\!\left(\frac{n^2}{\log n}\right). \tag{2} \]

□

This lemma suggests a synthesis more efficient than standard Gaussian elimination may 
be possible. The multiplicative constant in this lower bound is 1/2 (assuming logs are 
taken base 2). 
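
To get a feel for the counting bound at small sizes, one can compute the smallest d that satisfies the first inequality in the proof. This is an illustrative Python sketch using exact integer arithmetic; the function names are assumptions, and the result is only the bound implied by the counting argument, not the true worst-case circuit length.

```python
def num_linear_reversible(n):
    """Number of non-singular n x n binary matrices."""
    total = 1
    for i in range(n):
        total *= 2**n - 2**i
    return total

def counting_lower_bound(n):
    """Smallest d with (n^2 - n + 1)^d >= number of linear reversible functions.

    Assumes n >= 2 so that the base n^2 - n + 1 exceeds 1.
    """
    base = n * n - n + 1
    target = num_linear_reversible(n)
    d, power = 0, 1
    while power < target:
        power *= base
        d += 1
    return d

for n in (2, 3, 4, 8, 16):
    print(n, counting_lower_bound(n))
```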

3 Efficient Synthesis 

In this section we present our synthesis algorithm, which achieves the lower bound
given in the previous section. In Gaussian elimination, row operations are used to place
ones on the diagonal of the matrix and to eliminate any remaining ones. One row
operation is required for each entry in the matrix that is targeted. Since there are $n^2$ matrix
entries, $O(n^2)$ row operations are required in the worst case. If instead we group entries
together and use single row operations to change these groups, we can reduce the
number of row operations required, and therefore the number of gates needed to synthesize
the circuit.

The basic idea is as follows. We first partition the columns of the $n \times n$ matrix into
sections of no more than m columns each. We call the entries in a particular row and
section a sub-row. For each section we use row operations to eliminate sub-row patterns
that repeat in that section. This leaves relatively few ($< 2^m$) non-zero sub-rows in the
section. These remaining entries are handled using Gaussian elimination. If m is small
enough ($< \log_2 n$), most of the row operations result from the first step, which requires
a factor of m fewer row operations than full Gaussian elimination. As with the Gaussian
elimination based method, our algorithm is applied in two steps: first the matrix is
reduced to an upper triangular matrix; then the resulting matrix is transposed and the
process is repeated to reduce it to the identity. Detailed pseudo-code for our algorithm
is shown on the next page. The following example illustrates our algorithm for a 6-wire
linear reversible circuit.
1) Choose m = 2 and partition matrix.
2) (Step A - section 1) Eliminate duplicate sub-rows.
3) (Step B - section 1, column 1) One already on diagonal.
4) (Step C - section 1, column 1) Remove remaining ones in column below diagonal.



Algorithm 1: Efficient C-NOT Synthesis

[circuit] = CNOT_Synth(A, n, m)
{
  // synthesize lower/upper triangular part
  [A, circuit_l] = Lwr_CNOT_Synth(A, n, m)
  A = transpose(A);
  [A, circuit_u] = Lwr_CNOT_Synth(A, n, m)
  // combine lower/upper triangular synthesis
  switch control/target of C-NOT gates in circuit_u;
  circuit = [reverse(circuit_u) | circuit_l];
}

[A, circuit] = Lwr_CNOT_Synth(A, n, m)
{
  circuit = [];
  for (sec=1; sec<=ceil(n/m); sec++) // iterate over column sections
  {
    // remove duplicate sub-rows in section sec
    for (i=0; i<2^m; i++)
      patt[i] = -1; // marker for first positions of sub-row patterns
    for (row_ind=(sec-1)*m; row_ind<n; row_ind++)
    {
      sub_row_patt = A[row_ind, (sec-1)*m : sec*m-1];
      // if first copy of pattern save, otherwise remove
      if (patt[sub_row_patt] == -1)
        patt[sub_row_patt] = row_ind;
      else
      {
        A[row_ind, :] += A[patt[sub_row_patt], :];
        circuit = [C-NOT(patt[sub_row_patt], row_ind) | circuit]; // Step A
      }
    }
    // use Gaussian elimination for remaining entries in column section
    for (col_ind=(sec-1)*m; col_ind<=sec*m-1; col_ind++)
    {
      // check for 1 on diagonal
      diag_one = 1;
      if (A[col_ind, col_ind] == 0)
        diag_one = 0;
      // remove ones in rows below col_ind
      for (row_ind=col_ind+1; row_ind<n; row_ind++)
      {
        if (A[row_ind, col_ind] == 1)
        {
          if (diag_one == 0)
          {
            A[col_ind, :] += A[row_ind, :];
            circuit = [C-NOT(row_ind, col_ind) | circuit]; // Step B
            diag_one = 1;
          }
          A[row_ind, :] += A[col_ind, :];
          circuit = [C-NOT(col_ind, row_ind) | circuit]; // Step C
        }
      }
    }
  }
}



5) (Step A - section 2) Eliminate duplicate sub-rows below row 2.
6) (Step B - section 2, column 3) Place one on diagonal.
7) (Step C - section 2, column 3) Remove remaining ones in column below diagonal.
8) Matrix is now upper triangular. Transpose and continue.
9) (Step A - section 1) Eliminate duplicate sub-rows.
10) (Step B - section 1, column 1) Because matrix is triangular and non-singular there will always be ones on the diagonal.
11) (Step C - section 1, column 1) Remove remaining ones in column.
12) (Step C - section 1, column 2) Remove remaining ones in column.
13) (Step A - section 2) Eliminate duplicate sub-rows.
14) (Step C - section 2, column 1) Remove remaining ones in column.
15) (Step C - section 2, column 2) Remove remaining ones in column.
16) (Step C - section 3, column 1) Remove remaining ones in column.

The synthesized circuit is specified by the row operations and is shown in Figure 3.

Fig. 3. Synthesized C-NOT circuit example. The gates in the right and left boxes
correspond to row operations before and after the transpose step respectively. Those in the
left box are in the same order the row operations were applied and their controls and
targets are switched. The gates in the right box are in the reverse order that the row
operations were applied.
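
The procedure walked through above can be sketched in runnable form. The following Python sketch follows the structure of Algorithm 1 under a few stated assumptions: matrices are lists of 0/1 rows, wires are 0-indexed, all-zero sub-rows are skipped in Step A (a small deviation from the listing), and the names `lwr_cnot_synth`, `cnot_synth`, `apply_circuit` and `random_invertible` are illustrative. The last helper exists only to produce test inputs and needs n of at least 2.

```python
import random

def lwr_cnot_synth(A, m):
    """Reduce A (list of 0/1 rows, modified in place) to upper triangular form.

    Returns the row operations as (control, target) pairs in the order applied;
    each pair means row[target] ^= row[control].
    """
    n = len(A)
    ops = []

    def add_row(control, target):
        A[target] = [a ^ b for a, b in zip(A[target], A[control])]
        ops.append((control, target))

    for start in range(0, n, m):            # iterate over column sections
        stop = min(start + m, n)
        # Step A: remove duplicate sub-rows in this section.
        first_seen = {}
        for row in range(start, n):
            patt = tuple(A[row][start:stop])
            if not any(patt):
                continue                     # assumption: skip all-zero sub-rows
            if patt in first_seen:
                add_row(first_seen[patt], row)
            else:
                first_seen[patt] = row
        # Steps B and C: Gaussian elimination on what remains in the section.
        for col in range(start, stop):
            for row in range(col + 1, n):
                if A[row][col] == 1:
                    if A[col][col] == 0:
                        add_row(row, col)    # Step B: place a one on the diagonal
                    add_row(col, row)        # Step C: clear the one below it
    return ops

def cnot_synth(A, m):
    """Return C-NOT gates (control, target) in time order implementing A over F2."""
    work = [row[:] for row in A]
    ops_lower = lwr_cnot_synth(work, m)
    work = [list(col) for col in zip(*work)]          # transpose
    ops_upper = lwr_cnot_synth(work, m)
    swapped = [(t, c) for (c, t) in ops_upper]        # switch control/target
    return swapped + ops_lower[::-1]

def apply_circuit(gates, n):
    """Matrix computed by the circuit: each gate adds row `control` to row `target`."""
    M = [[int(i == j) for j in range(n)] for i in range(n)]
    for control, target in gates:
        M[target] = [a ^ b for a, b in zip(M[target], M[control])]
    return M

def random_invertible(n, rng):
    """Hypothetical test helper: scramble the identity with random row additions."""
    A = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(4 * n * n):
        control, target = rng.sample(range(n), 2)
        A[target] = [a ^ b for a, b in zip(A[target], A[control])]
    return A

rng = random.Random(0)
A = random_invertible(6, rng)
gates = cnot_synth(A, m=2)
assert apply_circuit(gates, 6) == A
print(len(gates), "gates")
```

The gate list is ordered as in the Figure 3 caption: the operations recorded after the transpose come first, in the order applied and with control and target swapped, followed by the operations recorded before the transpose in reverse order.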

In general, the length of the synthesized circuit is given by the number of row
operations used in the algorithm. By accounting for the maximum number of row operations
in each step, we can calculate an upper bound on the maximum number of gates that
could be required in synthesizing an n-wire linear reversible circuit. C-NOT gates are
added in the steps marked Step A-C in the algorithm. Step A is used to eliminate the
duplicates in the subsections. It is called fewer than $n + m$ times per section (combined
for the upper/lower triangular stages of the algorithm), giving a total of no more than
$(n + m) \cdot \lceil n/m \rceil$ gates. Step B is used to place ones on the diagonal. It can be called no
more than n times. Step C is used to remove the ones remaining after all duplicate
sub-rows have been cleared. Since there are only $2^m$ m-bit words, there can be at most as
many non-zero sub-rows below the $m \times m$ sub-matrix on the diagonal. Therefore, Step
C is called fewer than $m \cdot (2^m + m)$ times per section, or fewer than $2 \lceil n/m \rceil m \cdot (2^m + m)$
times in all. Adding these up we have



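\begin{align}
\text{total row ops} &< (n + m) \cdot \left\lceil \frac{n}{m} \right\rceil + n + 2 \left\lceil \frac{n}{m} \right\rceil m \cdot (2^m + m) \tag{3} \\
&< \frac{n^2}{m} + n + n + m + n + 2n2^m + 2nm + 2m2^m + 2m^2. \tag{4}
\end{align}

If $m = \alpha \log_2 n$,

\begin{align}
\text{total row ops} < \frac{n^2}{\alpha \log_2 n} &+ 3n + \alpha \log_2 n + 2n^{1+\alpha} + 2n\alpha \log_2 n \notag \\
&+ 2\alpha \log_2 n \cdot n^{\alpha} + 2(\alpha \log_2 n)^2. \tag{5}
\end{align}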



If $\alpha < 1$, the first term dominates as n gets large. Therefore the number of row operations
is $O(n^2/\log n)$. Combining this result with Lemma 1 we have the following theorem.

Theorem 1. The worst-case length of an n-wire C-NOT circuit is $\Theta(n^2/\log n)$ gates.

In Equation 5, $\alpha$ can be chosen to be arbitrarily close to 1. In the limit, the multiplicative
constant in the $O(n^2/\log n)$ expression becomes 1 (assuming logs are taken base 2). By
contrast, the multiplicative constant in the lower bound in Lemma 1 is 1/2.
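
The per-step accounting can be evaluated directly for concrete n and m. The sketch below simply adds the Step A, Step B and Step C terms described in the text; the function name `row_op_bound` is an illustrative assumption, and the value is a worst-case bound that can be quite loose for small n.

```python
import math

def row_op_bound(n, m):
    """Upper bound on row operations: Step A + Step B + Step C terms."""
    sections = math.ceil(n / m)
    step_a = (n + m) * sections                 # duplicate removal, both stages
    step_b = n                                  # placing ones on the diagonal
    step_c = 2 * sections * m * (2**m + m)      # remaining ones, both stages
    return step_a + step_b + step_c

for n in (64, 256, 1024):
    best_m = min(range(1, int(math.log2(n)) + 1), key=lambda m: row_op_bound(n, m))
    print(n, best_m, row_op_bound(n, best_m), n * n)
```

Scanning m over 1 to $\log_2 n$ shows how the $n^2/m$ term trades off against the $2n2^m$ term, which is why m must stay below $\log_2 n$ for the first term to dominate.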

This algorithm, in addition to generating more efficient circuits than the standard
method, is also asymptotically more efficient in terms of run time. The execution time
of the algorithm is dominated by the row operations on the matrix, which are each $O(n)$.
Therefore the overall execution time is $O(n^3/\log n)$ compared to $O(n^3)$ for standard
Gaussian elimination [8, p. 42].

Our algorithm is closely related to Kronrod's Algorithm (also known as "The Four
Russians' Algorithm") for construction of the transitive closure of a graph [?]. One
important difference between the two is that in their case the goal was a fast algorithm for
their application, which is only of secondary concern for our application. Our primary
goal is an algorithm that produces an efficient circuit synthesis. Generically, our
algorithm can be interpreted as producing an efficient elementary matrix decomposition of
a binary matrix.



4 Simulation Results 

Though Algorithm 1 is asymptotically optimal, it would be of interest to know how
large n must be before the algorithm begins to outperform standard Gaussian
elimination. For this purpose we have synthesized linear reversible circuits using both our
method and Gaussian elimination for randomly generated non-singular 0-1 matrices.
The results of these simulations are summarized in Figure 4. Our algorithm shows an
improvement over Gaussian elimination for n as small as 8. The length of the circuit
synthesis produced by Algorithm 1 is dependent on the choice of m, the size of the
column sections. Here we have somewhat arbitrarily chosen $m = \mathrm{round}((\log_2 n)/2)$. The
performance for some values of n could be significantly improved by optimizing this
choice. This would also smooth out the performance curve in Figure 4 for Algorithm 1.

Fig. 4. Performance of Algorithm 1 vs. Gaussian elimination on randomly generated
linear reversible functions. Each point corresponds to the average length of the
circuit generated for 100 randomly generated matrices. The x-axis specifies n, the
number of inputs/outputs of the linear reversible circuit, and the y-axis specifies the
average number of gates in the circuit synthesis. For Algorithm 1, m was chosen to be
round((log2 n)/2).
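
An experiment of this kind can be approximated with a short gate-counting harness. The Python sketch below is not the code used for the reported simulations; it assumes rows stored as integer bitmasks, treats Gaussian elimination as the same two-pass procedure with the duplicate-removal step switched off, skips all-zero sub-rows, and draws random matrices by rejecting singular samples. Python's `round` rounds halves to even, so m may differ from other rounding conventions for some n.

```python
import math
import random

def count_lower(rows, n, m, use_step_a):
    """Count row ops that make `rows` (bitmasks, mutated) upper triangular."""
    ops = 0
    for start in range(0, n, m):
        width = min(m, n - start)
        mask = ((1 << width) - 1) << start
        if use_step_a:                        # duplicate sub-row removal
            first = {}
            for r in range(start, n):
                patt = rows[r] & mask
                if patt == 0:
                    continue
                if patt in first:
                    rows[r] ^= rows[first[patt]]
                    ops += 1
                else:
                    first[patt] = r
        for c in range(start, start + width):  # Gaussian elimination steps
            for r in range(c + 1, n):
                if (rows[r] >> c) & 1:
                    if not (rows[c] >> c) & 1:
                        rows[c] ^= rows[r]
                        ops += 1
                    rows[r] ^= rows[c]
                    ops += 1
    return ops

def transpose(rows, n):
    return [sum(((rows[i] >> j) & 1) << i for i in range(n)) for j in range(n)]

def gate_count(rows, n, m, use_step_a=True):
    """Gates needed to reduce the matrix to the identity, or None if singular."""
    rows = list(rows)
    ops = count_lower(rows, n, m, use_step_a)
    rows = transpose(rows, n)
    ops += count_lower(rows, n, m, use_step_a)
    return ops if rows == [1 << i for i in range(n)] else None

def average_counts(n, trials, rng):
    """Average gate counts (sectioned algorithm, Gaussian elimination)."""
    m = max(1, round(math.log2(n) / 2))
    sectioned = gaussian = done = 0
    while done < trials:
        rows = [rng.getrandbits(n) for _ in range(n)]
        count = gate_count(rows, n, m, True)
        if count is None:
            continue                          # singular sample, draw again
        sectioned += count
        gaussian += gate_count(rows, n, m, False)
        done += 1
    return sectioned / trials, gaussian / trials

for n in (8, 16, 32, 64):
    print(n, average_counts(n, 100, random.Random(n)))
```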



5 Conclusions and Future Work 

We have given an algorithm for linear reversible circuit synthesis that is asymptotically 
optimal in the worst-case. We show that the algorithm is also asymptotically faster than 
current methods. While our results are primarily asymptotic, simulation results show 
that even in the finite case our algorithm outperforms the current synthesis method. 
Applications of our work include circuit synthesis for quantum circuits. 

While the primary motivations for the synthesis method we have given here are to
provide an asymptotic bound on circuit complexity and a practical method to synthesize
efficient circuits, another application is to bounds on circuit complexity for the finite
case. In particular, we can use our method to determine an upper bound on the maximum
number of gates required to synthesize any n-wire C-NOT circuit. For this application
the particular partitioning of the columns can be very important. For example, much
better bounds can be determined if the sizes of the sections are a function of the location
of the section in the matrix. Sections to the left have more rows below the diagonal
and therefore should be larger than sections towards the right of the matrix, which have
fewer rows below the diagonal. An ongoing area of work is determining optimal column
partitioning methods.

Our algorithm basically yields an efficient decomposition for matrices with
elements in $\mathbb{F}_2$, and can be generalized in a straightforward manner for matrices over any
finite field. The asymptotic size of the generalized decomposition is $O(n^2/\log_{|F|} n)$,
where $|F|$ is the order of the finite field. Our algorithm, particularly in this generalized
form, is quite generic and may lend itself to a wide range of other applications.
Related algorithms have applications in finding the transitive closure of a graph, binary
matrix multiplication, and pattern matching.

A major area of future work is extending our results to more general reversible
circuits, particularly quantum circuits. Currently, there is an asymptotic gap between the
best upper and lower bounds on the worst-case circuit complexity both for general
classical reversible circuits and quantum circuits. The gap for classical reversible circuits
is the same logarithmic factor that previously existed for linear reversible circuits [9],
which suggests it may be possible to extend our methods to this problem.